Abstract

The energy landscapes of proteins have evolved to be different from most random heteropolymers. Many 
studies have concluded that evolutionary selection for rapid and reliable folding to a given structure that is 
stable at biological temperatures leads to energy landscapes having a single dominant basin and an overall
funnel topography. We show here that, while such a landscape topography is indeed a sufficient condition
for folding, another possibility also exists, giving a new class of foldable sequences. These sequences have 
landscapes that are only weakly funneled in the conventional thermodynamic sense, but have unusually low 
kinetic barriers for reconfigurational motion. Traps have been specifically removed by selection. Here we
examine the possibility of folding on these "buffed" landscapes by mapping the statistics of pathways
for the heterogeneous nucleation processes involved in escaping from traps onto the solution of
an imaginary-time Schrödinger equation. This equation is solved analytically in adiabatic and "soft-wall"
approximations, and numerical results are shown for the general case. The fraction of funneled vs. buffed 
proteins in sequence space is estimated, suggesting the statistical dominance of the funneling mechanism for 
achieving foldability. 
Introduction. The mechanisms of protein folding are manifold. This diversity can only be captured by
a global statistical survey of the protein's energy landscape. What can be said about this landscape merely
from the knowledge that proteins fold?

A foldable protein is a heteropolymer which through Brownian motion finds a particular structure within
a short time, and once this structure is found, essentially stays in that structure. This requirement is both
thermodynamic and kinetic. Thermodynamically, the folded state of a protein must be stable and tolerably
unique. While conformational substates can be functionally important and are decidedly present [1], the 
specificity of proteins required for their work in the cellular network leads to strong constraints on the 
structure of the active sites and binding sites of proteins. 

For random heteropolymers, thermodynamic foldability depends on the relation between the physiological
temperature and the glass temperature $T_G$, which depends on the amino acid composition. For some compositions,
e.g. near homopolymers, $T_G$ is low, while for other compositions using many chemically distinct
amino acids, $T_G$ could be higher than physiological temperatures. If physiological temperatures are very
high, only a small fraction of all sequences will be foldable in a thermodynamic sense, because few sequences
would have sufficient stability in the ground-state conformation against entropic costs. At sufficiently low
temperatures, all sequences would have stable ground states, provided the solvent is such that interaction
free energies do not change much as temperature is lowered. But in this low-temperature case, the kinetics
of folding would become the controlling factor. For a typical random sequence of amino acids, while the
lowest energy state becomes thermodynamically stable only below the glass transition temperature $T_G$, at
these low temperatures several other traps will also become stable. The average escape time from these traps
scales exponentially with chain length. Thus for long enough chains folding will be slow.

One resolution of this conflict between kinetic and thermodynamic constraints is for proteins to have
evolved to have compositions with $T_G < T$, but to be unusual sequences that are nevertheless stable in some
configuration above $T_G$ via the introduction of an energy gap in the density of states, so that one energetic
basin dominates. The energy landscape is funneled, with a single dominant basin. According to this view,
proteins are atypical heteropolymers, having their folding transition temperature $T_F$ above the glass
temperature $T_G$ ordinarily to be expected from the overall amino acid composition. This is the quantitative
form of the principle of minimal frustration put forward by Bryngelson and Wolynes [2].

An extreme limit of the principle of minimal frustration would be for there to exist perfect consistency of 
interactions in the native state, an idea due to Go [3] . In the space of all sequences, such perfection is less likely 
than merely achieving a satisfactory level of resolved frustration. Most microscopic potential models suggest 
natural proteins still exhibit frustration in their ground state. The existence of this frustration is buttressed 
by the observation that many proteins can be further stabilized by single site mutations. Nevertheless, the 
perfect funnel model does describe the kinetics of folding of many proteins as a first approximation, as seen
from the good agreement of $\Phi$ values predicted using Go models with experiments [4].

The iconoclasts among us, nevertheless, must ask whether the achievement of a funneled landscape with
$T_F/T_G > 1$ is the only way the kinetic and thermodynamic requirements for folding can be met. Rather than
finding sequences that increase the native stability under conditions where the dynamics of a random sequence
would be facile, we can imagine designing an abnormally trap-free landscape giving unusually high mobility
throughout the misfolded parts of the energy landscape.

In this case, heteropolymers having compositions giving $T_G > T$ would easily have a ground state that is
stable, but it would be necessary to design out the barriers for escaping the deep traps, so that those barriers 
were atypically small. The density of low energy states would not be atypical in this case, rather what is 
atypical is the size of the kinetic barriers between the states. Such an energy landscape could be said to be
"buffed". Even on such a buffed energy landscape, the native state must still have some extra stability to
avoid the time-scale problems of a random search. But the extra stability in this scenario need not scale
extensively with system size, and may scale as a non-extensive fluctuation $\sim N^{1/2}$ or smaller.

It is intriguing that some of the successes of theories based on the strongly funneled landscape would be
preserved for theories based on buffed landscapes. For example, because the ruggedness en route to the
native structure is reduced and the effects of transient trapping diminished, native structure formation (as
measured through $\Phi$ values) will once again be dominated by polymeric properties. However, there should
be observable differences between buffed and funneled landscapes in stability and robustness. The native
state on a funneled landscape is marginally stable with respect to a large ensemble of unfolded states having
significant entropy. The native state on a buffed landscape is marginally stable with respect to a few
dissimilar low-energy states that are kinetically connected to it. Thus the thermal unfolding transition on a
buffed landscape should be less cooperative than on a funneled landscape. The folded structure and folding 
mechanism are also less robust on a buffed landscape, since mutations can more easily open up or close off 
regions of searchable phase space. These differences between the properties expected for buffed sequences 
and natural proteins suggest that natural protein landscapes in the main are funneled rather than buffed. 

Yet given the possibility of having a buffed energy landscape, the explanation for the prevalence of funneled
landscapes must be sought in statistics and population biology [5]. The question we address is whether
the fraction of sequences with buffed landscapes is comparable to or larger than the fraction of sequences with
minimally frustrated landscapes in the conventional sense. This is the task of the present paper.

Nucleation. Both folding and trap escape can be thought of as the growth of a new phase into a preexisting
phase. This growth requires overcoming a barrier of finite height. According to classical nucleation
theory, we can introduce a reaction coordinate $N_e$ describing the amount of new phase, $0 < N_e < N$, where
$N$ is the length of the chain. The typical free energy profile as a function of $N_e$ is

$$F(N_e) = F(0) - f N_e + \sigma N_e^{z} . \qquad (1)$$

Here $f$ is the bulk free energy gain per residue in the new phase and $\sigma$ is the surface free energy cost per
residue. These parameters are each a function of temperature.

For a uniformly stable phase, the exponent $z = 2/3$. However, in a sufficiently heterogeneous medium the
interface between phases can relax to a lower free energy configuration which is roughened. The larger the
nucleus, the more the interface can relax, and the smaller the surface tension. For a wide class of models in
3 dimensions in this regime [6,7], $\sigma \to \sigma_0 / N_e^{1/6}$ and the surface free energy cost scales as $N_e^{1/2}$ rather than
$N_e^{2/3}$. This is also true for a random first-order glass transition.

Setting $dF(N_e)/dN_e = 0$ gives the typical critical nucleus size, $N_e^{\ddagger} = N n^{\ddagger} = (z\sigma/f)^{1/(1-z)}$, where
$0 < n^{\ddagger} < 1$. The typical free energy barrier height is

$$F^{\ddagger} = \sigma (1 - z) \left( \frac{z\sigma}{f} \right)^{z/(1-z)} = \sigma (1 - z)\, (n^{\ddagger})^{z} N^{z} . \qquad (2)$$

Thus the barrier arises from the surface cost, and scales like $N^{z}$.
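As a quick numerical check of eqs. (1) and (2), the Python sketch below evaluates the closed-form critical nucleus and barrier and compares them with a brute-force scan of the profile. The parameter values ($f = 0.5$, $\sigma = 3$, in units of $k_BT$) and the function names are illustrative assumptions, not values taken from this paper.

```python
import math


def profile(n_e, f, sigma, z, f0=0.0):
    """Nucleation free energy profile F(N_e) = F(0) - f*N_e + sigma*N_e**z, eq. (1)."""
    return f0 - f * n_e + sigma * n_e ** z


def critical_nucleus(f, sigma, z):
    """Critical nucleus N_e^ddagger = (z*sigma/f)**(1/(1-z)), from dF/dN_e = 0."""
    return (z * sigma / f) ** (1.0 / (1.0 - z))


def barrier(f, sigma, z):
    """Barrier height F^ddagger = sigma*(1-z)*(z*sigma/f)**(z/(1-z)), eq. (2)."""
    return sigma * (1.0 - z) * (z * sigma / f) ** (z / (1.0 - z))


if __name__ == "__main__":
    f, sigma = 0.5, 3.0  # illustrative values in units of k_B T (assumed)
    grid = [0.01 * i for i in range(1, 20001)]  # N_e from 0.01 to 200
    for z in (2.0 / 3.0, 0.5):  # uniform interface vs. roughened interface
        n_scan = max(grid, key=lambda n: profile(n, f, sigma, z))
        print(f"z={z:.3f}  N_e*={critical_nucleus(f, sigma, z):.2f} "
              f"(scan {n_scan:.2f})  F*={barrier(f, sigma, z):.3f}")
```

For these toy numbers, $z = 2/3$ gives $N_e^{\ddagger} = 64$ and $F^{\ddagger} = 16\,k_BT$, while $z = 1/2$ gives $N_e^{\ddagger} = 9$ and $F^{\ddagger} = 4.5\,k_BT$, so the scan peaks exactly where the closed form says it should.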

Generally, for glassy systems at high enough temperature, trap escape becomes a downhill process, corresponding
to a vanishing of the surface tension in the nucleation picture.

Atypically rugged and funneled sequences. Suppose each amino acid were chosen from a pool
with probability $p_i^0$ for residue type $i$, where $1 \le i \le m$ with $m = 20$ and $\sum_i p_i^0 = 1$. Shorter sequences
may have compositions with probabilities of occurrence $\{p_i\}$ differing from $\{p_i^0\}$, but as the chain length
increases, $p_i$ approaches $p_i^0$. The probability of a composition $\{n_i\}$, $0 \le n_i \le N$, chosen randomly from a pool
with natural abundances $\{p_i^0\}$ is given by a multinomial distribution. We are interested in the likelihood of
finding a sequence with an unusually rugged landscape, so we seek the value of the fluctuations $\langle (\delta(b^2))^2 \rangle$ in the
energetic variance per interaction $b^2$ as various sequences of length $N$ are sampled. The conformationally
averaged variance per contact $b^2$ for a given sequence is given by $\sum_{i,j=1}^{m} \delta\epsilon_{ij}^2\, p_i p_j$ (see e.g. ref. [8]), where
$\delta\epsilon_{ij}^2$ is the variance of an element of the pair interaction matrix between residue types $i$ and $j$, e.g. the
Miyazawa-Jernigan matrix [9].

The average sequence-to-sequence fluctuation in $b^2$, $\langle (\delta(b^2))^2 \rangle$, is obtained by calculating the fluctuations
in the incorporation probabilities $p_i$, which decrease with increasing chain length as $N^{-1}$. The result is

$$\langle (\delta(b^2))^2 \rangle = \frac{4}{N} \sum_{i,j,k=1}^{m} \delta\epsilon_{ij}^2\, \delta\epsilon_{ik}^2\, p_i^0 (1 - p_i^0)\, p_j^0\, p_k^0 . \qquad (3)$$

As chain length becomes large, $P(b^2)$ becomes an increasingly sharply peaked function about the average
$\langle b^2 \rangle$. We apply the central limit theorem, so that the probability $P_R$ that a sequence has ruggedness greater than,
say, $b^2$ is just the integral of the tail of a Gaussian, i.e. an error function. Typically the interaction matrix
and pool compositions are such that the thermal energy of the heteropolymer is well above the ground-state
energy, i.e. $\langle b^2 \rangle$ is sufficiently small that the temperature $T > T_G$, where $T_G$ is the glass temperature in
the random energy model (REM) [10]. Later we will be interested in special sequences that are sufficiently
rugged that without buffing they would be kinetically trapped and thermodynamically stable in low-energy
states at biological temperatures. Because of the $N$ dependence in eq. (3), along with the fact that an
error function has an exponential tail, such extra-rugged sequences become exponentially rare with increasing
system size (see eq. (16)).
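The scaling argument above can be checked with a small Monte Carlo sketch: sample sequences from a pool, compute $b^2 = \sum_{ij} \delta\epsilon_{ij}^2 p_i p_j$ from each sequence's composition, and confirm that $N \langle (\delta(b^2))^2 \rangle$ is roughly independent of $N$, which is what makes the Gaussian tail $P_R$ shrink exponentially. The four-letter alphabet, its abundances `P0`, and the variance matrix `DEPS2` are hypothetical toy values, not the Miyazawa-Jernigan matrix, and the variance is estimated empirically rather than from eq. (3).

```python
import math
import random
import statistics

# Hypothetical 4-letter alphabet (NOT the Miyazawa-Jernigan values).
P0 = [0.4, 0.3, 0.2, 0.1]            # pool abundances p_i^0
DEPS2 = [                            # delta eps_ij^2, symmetric
    [0.2, 0.2, 0.2, 0.2],
    [0.2, 1.0, 1.0, 1.0],
    [0.2, 1.0, 2.0, 2.0],
    [0.2, 1.0, 2.0, 3.0],
]


def ruggedness(p):
    """b^2 = sum_ij delta_eps_ij^2 p_i p_j for a composition p."""
    m = len(p)
    return sum(DEPS2[i][j] * p[i] * p[j] for i in range(m) for j in range(m))


def sample_b2(n, rng):
    """Draw one length-n sequence from the pool and return its b^2."""
    seq = rng.choices(range(len(P0)), weights=P0, k=n)
    return ruggedness([seq.count(i) / n for i in range(len(P0))])


def b2_stats(n, trials=3000, seed=1):
    """Sample mean and variance of b^2 over random length-n sequences."""
    rng = random.Random(seed)
    values = [sample_b2(n, rng) for _ in range(trials)]
    return statistics.fmean(values), statistics.pvariance(values)


def tail_probability(threshold, mean, var):
    """Central-limit estimate of P_R = Prob(b^2 > threshold)."""
    return 0.5 * math.erfc((threshold - mean) / math.sqrt(2.0 * var))


if __name__ == "__main__":
    for n in (50, 200):
        mean, var = b2_stats(n)
        print(f"N={n:4d}  <b^2>={mean:.4f}  N*var={n * var:.4f}")
    mean, var = b2_stats(50)
    kappa = 50 * var                 # N * variance, roughly N-independent
    for n in (50, 200, 800):         # P_R for a fixed excess ruggedness
        print(n, tail_probability(mean + 0.1, mean, kappa / n))
```

Because the variance falls as $1/N$, the error-function argument for a fixed excess ruggedness grows as $N^{1/2}$, and the printed tail probabilities drop roughly like $e^{-\mathrm{const}\cdot N}$.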

Given an ensemble of sequences with ruggedness $b^2$, we now seek the fraction of these sequences with
an energy gap larger than $\Delta$. The relevant gap $\Delta$ is between the energy of the ground-state conformation
and the energy of the next lowest globally dissimilar structure. Because these structures are globally dissimilar,
their energies $E$ can be treated as independent Gaussian random variables with distribution $P(E)$ of variance $N c b^2$, where $c$ is the number of
interactions per residue. If there are $\Omega = e^{Ns}$ total dissimilar conformations (all assumed compact with the
same $c$; here $s$ is the conformational entropy per residue), $\Omega - 1$ of them must be $\Delta$ or higher in energy
than a particular one. The particular ground state can have any energy but is most likely to have energy
near the REM ground state for a collection of $\Omega$ states [10]. Any of the $\Omega$ states are candidates to be the
ground-state structure. Then the fraction of sequences with gap larger than $\Delta$ is given by the expression

$$\Omega \int_{-\infty}^{\infty} dE\, P(E) \left( \int_{E+\Delta}^{\infty} dE'\, P(E') \right)^{\Omega - 1} , \qquad (4)$$

whose saddle-point evaluation was worked out previously in ref. [11]. Note that the smoother the landscape (smaller $b$), the smaller the fraction of sequences gapped
to $\Delta$. We assume here no selection in ruggedness for funneled sequences, i.e. funneled sequences have the
typical variance $\langle b^2 \rangle$, but have an atypical density of states.
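Eq. (4) can be sanity-checked on a toy scale where $\Omega$ is small enough to sample directly. The Python sketch below evaluates the integral by the midpoint rule and compares it with drawing $\Omega$ independent Gaussian energies and testing whether the lowest two are at least $\Delta$ apart. The choice $\Omega = 20$ with unit total variance is an illustrative assumption; in the text $\Omega = e^{Ns}$ is astronomically large, which is why a saddle point is used there instead.

```python
import math
import random


def gaussian_pdf(x, var):
    """Density P(E) of a zero-mean Gaussian with variance var (= N c b^2)."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def upper_tail(x, var):
    """Probability that a single state's energy lies above x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0 * var))


def gapped_fraction(omega, var, gap, lo=-12.0, hi=12.0, steps=24000):
    """Eq. (4): Omega * int dE P(E) [int_{E+gap}^inf P(E') dE']^(Omega-1), midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        e = lo + (k + 0.5) * h
        total += gaussian_pdf(e, var) * upper_tail(e + gap, var) ** (omega - 1)
    return omega * total * h


def gapped_fraction_mc(omega, var, gap, trials=10000, seed=7):
    """Direct sampling: draw Omega independent energies, test lowest-to-next gap >= gap."""
    rng = random.Random(seed)
    sd = math.sqrt(var)
    hits = 0
    for _ in range(trials):
        energies = sorted(rng.gauss(0.0, sd) for _ in range(omega))
        hits += energies[1] - energies[0] >= gap
    return hits / trials


if __name__ == "__main__":
    omega, var = 20, 1.0             # toy: Omega = 20 states, unit total variance
    for gap in (0.0, 0.25, 0.5, 1.0):
        print(gap, round(gapped_fraction(omega, var, gap), 4),
              gapped_fraction_mc(omega, var, gap))
```

At $\Delta = 0$ the integral is exactly one, as it must be; and because the result depends only on $\Delta$ divided by the energy spread, a larger variance (rougher landscape) gives a larger gapped fraction, in line with the remark above.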

Nucleation for trap escape. The alternative of buffed energy landscapes is not allowed in a strictly
mean-field picture of folding dynamics, where the landscape is determined by the global statistical properties
of the mean-field solution. However, for larger proteins, folding and trap escape depend on the spatially 
local (and/or local in sequence) properties of the energy landscape [12]. For folding, these capillarity models 
resemble those of an ordinary first order phase transition. On the other hand, in capillarity models of the 
glassy dynamics for trap escape based on the theory of random first-order phase transitions, fluctuations 
in the height of escape barriers scale in the same way as the typical barrier height [7]. This motivates an 
estimate for the size of the ensemble of sequences with anomalously small escape barriers. 

For a given sequence, the density of states is parameterized by two important energy scales, the variance
$b^2$ and the energy gap $\Delta$.¹ There are other features of the energy landscape, however, not completely
captured by the density of states, such as the barrier height between dissimilar configurations for a given
sequence. In mean-field theory the characteristic barrier height is largely determined by the density
of states and the Hamiltonian. However, in capillarity theory the fluctuations in barrier heights are large,
and the barrier $F^{\ddagger}$ for a particular sequence can deviate considerably from its typical value.

Given a trap of energy $E$, the typical profile for growth of a nucleus involved in trap escape is given by
formula (1) with $F(0)$ replaced by the energy of the trap:

$$F(N_e) = E - f N_e + \sigma N_e^{z} . \qquad (5)$$

Given a composition with ruggedness $b$, the free energy $F(N)$ of the ensemble of untrapped structures is given by the
total free energy $F(b,T)$ at temperature $T$. The value of $F(b,T)$ depends on a critical parameter

$$b_G = T \sqrt{2s/c} , \qquad (6)$$

which is the value of $b$ where the system is sufficiently rugged to be glassy at biological temperature $T$. In
terms of the total energetic variance $B^2 = N c b^2$ and the total conformational entropy $S = N s$, when
$b < b_G$ the free energy is $F(b,T) = -TS - B^2/(2T)$, and when $b > b_G$, $F(b,T)$ equals the ground-state energy
$-\sqrt{2 S B^2}$. Requiring the profile (5) to reach $F(b,T)$ at $N_e = N$ then determines the bulk free energy gain $f$.
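The free energy of the untrapped ensemble and the resulting bulk gain can be sketched in a few lines of Python. The matching rule used for `f` (the profile of eq. (5) must equal $F(b,T)$ at $N_e = N$) is our reading of the text, and all parameter values, with $k_B = 1$, are illustrative assumptions.

```python
import math


def glass_ruggedness(T, s, c):
    """Critical ruggedness b_G = T * sqrt(2 s / c), eq. (6)."""
    return T * math.sqrt(2.0 * s / c)


def untrapped_free_energy(b, T, N, s, c):
    """REM free energy F(b,T) of the untrapped ensemble."""
    S = N * s                 # total conformational entropy
    B2 = N * c * b * b        # total energetic variance
    if b < glass_ruggedness(T, s, c):
        return -T * S - B2 / (2.0 * T)
    return -math.sqrt(2.0 * S * B2)   # frozen: REM ground-state energy


def bulk_gain(E, b, T, N, s, c, sigma, z):
    """Bulk gain f chosen so that eq. (5) reaches F(b,T) at N_e = N (assumed matching rule)."""
    return (E - untrapped_free_energy(b, T, N, s, c) + sigma * N ** z) / N


if __name__ == "__main__":
    T, N, s, c = 1.0, 100, 1.0, 2.0        # illustrative units with k_B = 1
    print("b_G =", glass_ruggedness(T, s, c))
    for b in (0.5, 1.0, 1.2):
        print(b, untrapped_free_energy(b, T, N, s, c))
    print("f =", bulk_gain(-230.0, 1.2, T, N, s, c, sigma=3.0, z=0.5))
```

The two branches of $F(b,T)$ meet continuously at $b = b_G$, where both equal $-2NsT$; this is a useful check that eq. (6) and the two free energy expressions are mutually consistent.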

We approximate the surface tension as proportional to the estimate for the mean-field escape barrier per 
residue calculated by us earlier [13] on the correlated landscape. That calculation shows the surface tension 
is always intensive and vanishes above a critical energy E* or below a critical ruggedness b* . 

Atypical buffed sequences. Eq. (5) is understood to be the typical profile for trap escape for a
given overall composition. However, fluctuations from this mean profile will occur for various sequences. We
expect that these fluctuations can be relatively large, since reconfigurational barriers scale as $N^{1/2}$, while the
distribution of a random process after $N$ events of residues joining the growing phase also has a width $\sim N^{1/2}$.
Just as we found rare sequences that were anomalously rugged or funneled, we are interested now in finding
rare sequences that are anomalously buffed, with low kinetic barriers.

Since the variance in interaction energies is $b^2$, when the amount of new phase increases in a nucleation
process by $\delta N_e = 1$, the change in free energy $\delta F$ for a given sequence is chosen from a Gaussian distribution
with mean $\delta\bar{F}(N_e)$, the corresponding increment of the typical profile $\bar{F}(N_e)$ of eq. (5), and variance $c b^2$.

¹ Trap escape affects the prefactor to folding on a gapped landscape, or the approach to the ground state on a buffed
landscape, so the relevant free energy profile only involves the variance $b^2$.

A particular nucleation process corresponds to a sequence of increments
$\{\delta F(1), \delta F(2), \ldots, \delta F(N)\}$, so that the probability $P[F(N_e)]$ of a particular free energy profile
$F(N_e)$ is given by a path integral over $N_e$. The probability of a free energy profile is analogous to the
probability amplitude of finding a particular path for a quantum particle. To find the probability $\phi(F^{\ddagger})$
that a nucleation path has an escape barrier from a given trap lower than some value $F^{\ddagger}$, and that does
not allow intermediate traps deeper than $F^{\ddagger}$ below $F(b,T)$, we introduce an infinite square-well potential
$V(F(N_e)|F^{\ddagger})$ to constrain the profile. $V(F(N_e)|F^{\ddagger})$ is zero as long as $F(b,T) - F^{\ddagger} < F(N_e) < E + F^{\ddagger}$,
and infinite otherwise. It serves as an absorbing boundary for any paths wandering higher than $F^{\ddagger}$ above
$E$ or lower than $F^{\ddagger}$ below $F(b,T)$. The probability we seek can then be expressed using a Green's function:

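$$G(F(b,T), N\,|\,E, 0) = \int \mathcal{D}F(N_e)\, \exp\left\{ -\int_0^N dN_e \left[ \frac{1}{2cb^2} \left( \frac{dF}{dN_e} - \frac{d\bar{F}}{dN_e} \right)^2 + V(F(N_e)|F^{\ddagger}) \right] \right\} \delta(F(0) - E)\, \delta(F(N) - F(b,T)) , \qquad (7)$$

where $\bar{F}(N_e)$ is the typical (mean-field) profile of eq. (5), and the delta functions ensure that the paths must start and end at the trapped and untrapped free
energies respectively.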

To calculate the appropriate fraction of nucleation paths $\phi(F^{\ddagger})$ that have an escape barrier from a particular
trap no larger than $F_{\max} = E + F^{\ddagger} - (F(b,T) - F^{\ddagger})$, we must normalize by the free propagator
$G_F(F(b,T), N|E, 0)$ in the absence of absorbing walls, i.e. with $V$ set to zero:

$$\phi(F^{\ddagger}) = G(F(b,T), N|E, 0) \,/\, G_F(F(b,T), N|E, 0) . \qquad (8)$$
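The ratio in eq. (8) is the probability that a free, endpoint-pinned random profile never touches the absorbing walls, so it can also be estimated by direct sampling instead of solving for the Green's functions. The Python sketch below draws Gaussian increments with the mean-field drift and variance $c b^2$, pins the endpoint with the exact Gaussian-bridge correction, and counts survivors. The drift form $-f + z\sigma N_e^{z-1}$ evaluated at integer steps, the function name `phi_monte_carlo`, and all parameter values are illustrative assumptions; checking the walls only at integer steps also slightly overestimates survival relative to the continuum.

```python
import math
import random


def phi_monte_carlo(E, F_end, drift, var_step, lower, upper, trials=5000, seed=3):
    """Estimate phi = G/G_F of eq. (8) as the fraction of endpoint-pinned profiles
    that never leave the window (lower, upper).

    drift[k]  : mean increment of the typical profile at step k (length N)
    var_step  : variance c*b^2 of each increment
    """
    rng = random.Random(seed)
    n = len(drift)
    sd = math.sqrt(var_step)
    survived = 0
    for _ in range(trials):
        path = [E]
        for mu in drift:                      # free (unconstrained) Gaussian walk
            path.append(path[-1] + mu + rng.gauss(0.0, sd))
        shift = F_end - path[-1]
        # Adding shift*k/n turns the free walk into an exact sample of the walk
        # conditioned on F(N) = F_end (Gaussian bridge with equal-variance steps).
        survived += all(lower < path[k] + shift * k / n < upper for k in range(n + 1))
    return survived / trials


if __name__ == "__main__":
    f, sigma, z, N = 0.3, 0.5, 0.5, 40       # illustrative toy parameters
    drift = [-f + z * sigma * (k + 1) ** (z - 1.0) for k in range(N)]
    E, F_end = 0.0, -3.0                      # trap energy and F(b,T), assumed values
    for F_dd in (1.0, 2.0, 4.0):              # candidate buffing thresholds F^ddagger
        print(F_dd, phi_monte_carlo(E, F_end, drift, 1.0, F_end - F_dd, E + F_dd))
```

Since every profile that survives a narrow window also survives a wider one, the estimate of $\phi(F^{\ddagger})$ grows monotonically with $F^{\ddagger}$ for a fixed random seed.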

Evaluating these Green's functions is made easier by using the quantum-mechanical analogy. We recognize
the term $d\bar{F}/dN_e$ in eq. (7) as a time-dependent gauge transformation in one dimension, so we see that the
propagator to free energy $F$ after $N_e$ steps in the path-integral problem satisfies an imaginary-'time'
Schrödinger equation

$$\frac{\partial G}{\partial N_e}(F,N_e) - \frac{\hbar^2}{2m} \frac{\partial^2 G}{\partial F^2}(F,N_e) + \dot{\bar{F}}(N_e)\, \frac{\partial G}{\partial F}(F,N_e) + V(F(N_e)|F^{\ddagger})\, G(F,N_e) = \delta(F - E)\, \delta(N_e) , \qquad (9)$$

where $\dot{\bar{F}} = d\bar{F}/dN_e = -f + z\sigma N_e^{z-1}$. The situation is shown in Figure 1. Here $-iN_e$ plays the role of
time, $F$ the role of position, $\hbar = 1$, and the mass is $m = 1/(cb^2)$.
The free propagator $G_F(F(b,T), N|E, 0)$ is straightforward:

$$G_F(F(b,T), N|E, 0) = (2\pi c b^2 N)^{-1/2} \exp\left[ -\frac{(F(b,T) - \bar{F}(N))^2}{2 c b^2 N} \right] , \qquad (10)$$

where $\bar{F}(N_e)$ is again the mean-field profile.
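To illustrate eqs. (9) and (10), the Python sketch below integrates the drift-diffusion form of eq. (9) with an explicit finite-difference scheme, assuming (as in the Appendix's adiabatic approximation) that the drift $\dot{\bar{F}}$ can be treated as a constant `v`. Without walls the numerical propagator should reproduce the Gaussian of eq. (10); with absorbing walls at hypothetical positions it gives the constrained propagator, and the ratio of the two at the endpoint is $\phi$ of eq. (8). The grid spacing, step size, and all parameter values are illustrative choices.

```python
import math


def solve_fokker_planck(E, v, cb2, N, F_lo, F_hi, dF=0.1, walls=None):
    """Explicit finite-difference solution of eq. (9) with a constant drift v:
    dG/dN_e = (cb2/2) d2G/dF2 - v dG/dF, started from delta(F - E).

    G is held at zero at the grid ends and, if walls=(lower, upper) is given,
    everywhere outside the infinite square well V(F | F^ddagger).
    """
    D = 0.5 * cb2                           # hbar^2 / 2m with m = 1/(c b^2)
    steps = max(1, math.ceil(N / (0.4 * dF * dF / D)))   # keeps D dt/dF^2 <= 0.4
    dt = N / steps
    n_pts = round((F_hi - F_lo) / dF) + 1
    xs = [F_lo + i * dF for i in range(n_pts)]
    inside = [walls is None or walls[0] < x < walls[1] for x in xs]
    g = [0.0] * n_pts
    g[round((E - F_lo) / dF)] = 1.0 / dF    # discrete delta function
    a = D * dt / dF ** 2                     # diffusion weight per step
    b = v * dt / (2.0 * dF)                  # central-difference drift weight per step
    for _ in range(steps):
        new = [0.0] * n_pts
        for i in range(1, n_pts - 1):
            if inside[i]:
                new[i] = (g[i] + a * (g[i + 1] - 2.0 * g[i] + g[i - 1])
                          - b * (g[i + 1] - g[i - 1]))
        g = new
    return xs, g


def free_propagator(F, E, v, cb2, N):
    """Eq. (10) for constant drift: Gaussian with mean E + v N and variance c b^2 N."""
    var = cb2 * N
    return math.exp(-(F - E - v * N) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    E, v, cb2, N = 0.0, -0.2, 1.0, 5.0       # toy values; v stands in for dF/dN_e
    xs, free = solve_fokker_planck(E, v, cb2, N, -15.0, 15.0)
    _, walled = solve_fokker_planck(E, v, cb2, N, -15.0, 15.0, walls=(-3.0, 1.5))
    i = round((-2.0 + 15.0) / 0.1)           # evaluate at F = -2
    print("numeric free:", free[i], " eq. (10):", free_propagator(xs[i], E, v, cb2, N))
    print("phi at F = -2:", walled[i] / free[i])
```

The explicit scheme is only conditionally stable, which is why the step size is tied to $dF^2/D$; an implicit scheme would allow larger steps but needs a tridiagonal solve.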

We have been unable to obtain an exact, closed form solution for the non-separable, "time-dependent" 
problem with boundaries. However we give in the appendix several analytical solutions in some reasonable 
limiting cases, and solve the problem numerically in general. 

There are polynomially many nucleation sites for a contiguous nucleus, which can be estimated geometrically.
We equate this with the number of possible ways or routes $N_R$ to escape from a trap. If any of these
nucleation sites is buffed to $F^{\ddagger}$, the trap is then assumed to be buffed to $F^{\ddagger}$. We assume independent
buffing probabilities for each of the distinct routes. Then the probability that $n$ of the routes are buffed is
simply a binomial distribution for $n$ events, the probability of each of which is $\phi(F^{\ddagger})$ given in eq. (8).
Then the probability that at least one route is buffed to $F^{\ddagger}$, i.e. the probability that a given trap of energy
$E$ is buffed, is $p_B(F^{\ddagger}|E,b) = 1 - (1 - \phi(F^{\ddagger}))^{N_R}$.
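Because $\phi(F^{\ddagger})$ is exponentially small, evaluating $1 - (1 - \phi)^{N_R}$ naively in floating point loses it entirely. A minimal Python sketch, using the standard `math.log1p` and `math.expm1` functions, keeps full precision; the function name is our own.

```python
import math


def trap_buffing_probability(phi, n_routes):
    """p_B = 1 - (1 - phi)**N_R: probability that at least one of N_R independent
    escape routes is buffed, evaluated with log1p/expm1 so tiny phi is not lost."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if phi == 1.0:
        return 1.0
    return -math.expm1(n_routes * math.log1p(-phi))


if __name__ == "__main__":
    for phi in (1e-3, 1e-12, 1e-20):
        naive = 1.0 - (1.0 - phi) ** 50
        print(phi, trap_buffing_probability(phi, 50), naive, 50 * phi)
```

For $\phi = 10^{-20}$ the naive expression returns exactly zero, while the stable form returns $N_R \phi$, which is the small-$\phi$ limit $p_B \approx N_R \phi$ used later in the text.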

Density of states and density of traps. To determine if a sequence, rather than just a specific trap, 
is buffed we need to know how many candidate traps have energy E. We estimate the number of candidate 
traps using the mean-field calculation for a "typical" sequence. Then the effects of sequence-to-sequence
fluctuations can be investigated.

A configurational state is a local trap if all states kinetically connected to it by a single local move are
higher in energy. Since the number of states connected to a given state equals $N\nu$ when each residue can
move to $\nu$ other states on average, there is a polynomial fraction of traps on uncorrelated landscapes [14],
$f_T \approx 1/(N\nu)$. On a correlated landscape, however, there is an exponentially small fraction of traps. To see
this, let the conditional probability that a state has energy $E'$ given that it shares a fraction $q$ of its contacts
with a state of energy $E$ be $P_q(E'|E)$. Then the fraction of states at energy $E$ that are traps is given by

$$f_T(E) = \left[ \int_{E}^{\infty} dE'\, P_q(E'|E) \right]^{N\nu} , \qquad (11)$$

with $q$ suitably chosen as below.

For a Hamiltonian consisting of interacting sets of $p$ contacts, the conditional probability $P_q(E'|E)$ obeys
a Gaussian distribution centered on $q^p E$ with variance $2B^2(1 - q^{2p})$ [15]. Thus when $q \to 1$, $P_q(E'|E) \to
\delta(E' - E)$, and when $q \to 0$ or $p \to \infty$, the states become uncorrelated and $P_q(E'|E) \to P(E')$. When
$p = 1$, the landscape statistics of a two-body Hamiltonian are recovered. The more "many-body" the
Hamiltonian is, the more decorrelated states of a given structural similarity are.

Since we are looking at states connected by single kinetic moves, we take $E' = E + \delta E$ and $q = 1 - \delta q$,
with $\delta E \sim O(N)$ but small compared to $E$, and $\delta q \lesssim O(N^{-1})$, since we envision local moves of an intensive
number of residues. Then, since $q^p E \approx (1 - p\,\delta q) E$, the distribution of $\delta E$ becomes a Gaussian with mean $-pE\,\delta q$ and variance $4B^2 p\,\delta q$,
and the fraction of traps becomes $f_T(E) = \left[ \tfrac{1}{2} \operatorname{erfc}\left( \sqrt{p\,\delta q}\, E / 2\sqrt{2}B \right) \right]^{N\nu}$, where $\operatorname{erfc}(x)$ is the complementary
error function. Thus $f_T$ is of the form of a fraction raised to a large power. It is not significant until
the argument of erfc (which is intensive) is fairly large and negative. Then the error function may be
asymptotically expanded around one to yield

$$f_T(E) \approx e^{-c(E) N} , \qquad (12)$$

where $c(E) = \nu \left( \pi p\, \delta q\, E^2 / 2B^2 \right)^{-1/2} \exp\left( -p\, \delta q\, E^2 / 8B^2 \right)$ is a function of an intensive argument. Retracing
the steps in the above argument for an uncorrelated landscape (or letting $p \to \infty$) results in nearly every
state constituting a trap, consistent with previous results.
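The exponential smallness of the trap fraction can be seen numerically. The Python sketch below evaluates $\ln f_T(E) = N\nu \ln[\tfrac{1}{2}\operatorname{erfc}(\sqrt{p\,\delta q}\,E/2\sqrt{2}B)]$ directly and compares the per-residue rate with the large-$|E|$ coefficient $c(E)$ of eq. (12). The scalings $E = eN$ (measured from the mean energy, so low-lying states have $E < 0$), $B^2 = Ncb^2$, and $\delta q = k/N$, the helper names, and all numbers are illustrative assumptions.

```python
import math


def _scaled(e_per_residue, N, p, k, c_contacts, b):
    """Return (p*delta_q, E, B^2) under the illustrative scalings used here."""
    dq = k / N                              # delta q ~ 1/N for a local move
    E = e_per_residue * N                   # energy measured from the mean, O(N)
    B2 = N * c_contacts * b * b             # B^2 = N c b^2
    return p * dq, E, B2


def log_trap_fraction(e_per_residue, N, nu, p, k, c_contacts, b):
    """ln f_T(E) = N nu ln[(1/2) erfc(sqrt(p dq) E / (2 sqrt(2) B))], evaluated directly."""
    pdq, E, B2 = _scaled(e_per_residue, N, p, k, c_contacts, b)
    arg = math.sqrt(pdq) * E / (2.0 * math.sqrt(2.0 * B2))
    return N * nu * math.log(0.5 * math.erfc(arg))


def c_asymptotic(e_per_residue, N, nu, p, k, c_contacts, b):
    """Large-|E| coefficient c(E) in f_T ~ exp(-c(E) N), eq. (12)."""
    pdq, E, B2 = _scaled(e_per_residue, N, p, k, c_contacts, b)
    return (nu * (math.pi * pdq * E * E / (2.0 * B2)) ** -0.5
            * math.exp(-pdq * E * E / (8.0 * B2)))


if __name__ == "__main__":
    for N in (50, 100, 200, 400):           # ln f_T / N stays constant
        lf = log_trap_fraction(-1.0, N, 3, 2, 2.0, 2.0, 1.0)
        print(N, lf, lf / N)
    for e in (-2.0, -4.0, -6.0):            # direct rate vs. asymptotic c(E)
        direct = -log_trap_fraction(e, 100, 3, 2, 2.0, 2.0, 1.0) / 100
        print(e, direct, c_asymptotic(e, 100, 3, 2, 2.0, 2.0, 1.0))
```

The ratio $\ln f_T / N$ is independent of $N$, i.e. $f_T$ is exponentially small in chain length, and the asymptotic $c(E)$ approaches the direct rate as the erfc argument becomes more negative (within about 5% once the argument reaches $-3$).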

The total number of traps $\Omega_T(E,b)$ at energy $E$ is then the total number of states $\Omega$ times the fraction
$P(E)$ of those states at energy $E$, times the fraction $f_T(E)$ of those states that are traps. A plot of the log
number of states and log number of traps is shown in Figure 2.

The probability that all the traps at energy $E$ will be buffed to $F^{\ddagger}$ is

$$P_B(F^{\ddagger},E,b) = p_B(F^{\ddagger}|E,b)^{\Omega_T(E,b)} , \qquad (13)$$

assuming independent buffing probabilities for each trap. This assumption is likely to underestimate $P_B$. On
the other hand, we have assumed independent probabilities for each of the $N_R$ routes out of a given trap to
be buffed. This overestimates the value of $p_B$, bringing it closer to unity.

As energy increases above that of the ground state, the quantity $P_B$ rapidly approaches unity.
It is hardest to buff out the lowest energies, even considering that there are more traps at higher energies
that must be buffed out. The fraction of sequences with available traps buffed at all energies that may be
thermally occupied is

$$P_B(F^{\ddagger},b) = \prod_E P_B(F^{\ddagger},E,b) = \exp\left[ \int dE\, \Omega_T(E,b) \ln p_B(F^{\ddagger}|E,b) \right] . \qquad (14)$$
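In practice eqs. (13) and (14) are best evaluated in logarithms, since $p_B^{\Omega_T}$ underflows for any realistic trap count. The Python sketch below discretizes the integral in (14) over energy bins. The trap counts $\Omega_T(E) = e^{E/2}$ and the per-trap probability $p_B(E) = 1 - \tfrac{1}{2}e^{-2E}$ used in the demo are purely hypothetical toy forms chosen to mimic $p_B$ approaching unity at higher energy.

```python
import math


def log_sequence_buffed(energies, n_traps, log_p_b):
    """ln P_B(F^ddagger, b) from eqs. (13)-(14), discretized over energy bins:
    sum_E Omega_T(E) * ln p_B(F^ddagger | E). Logs avoid underflow of p_B**Omega_T."""
    return sum(n * log_p_b(e) for e, n in zip(energies, n_traps))


def toy_log_p_b(e):
    """Hypothetical ln p_B(E) with p_B = 1 - 0.5*exp(-2E), E above the ground state."""
    return math.log1p(-0.5 * math.exp(-2.0 * e))


if __name__ == "__main__":
    energies = [0.0, 1.0, 2.0, 3.0]                    # toy energy bins
    n_traps = [math.exp(0.5 * e) for e in energies]    # hypothetical Omega_T(E)
    terms = [n * toy_log_p_b(e) for e, n in zip(energies, n_traps)]
    print("per-bin contributions:", [round(t, 4) for t in terms])
    print("P_B =", math.exp(log_sequence_buffed(energies, n_traps, toy_log_p_b)))
```

With these toy inputs the lowest bin contributes most of $\ln P_B$ even though it holds the fewest traps, which illustrates (for this example only) the statement that the lowest energies are the hardest to buff out.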



Sequences with buffed landscapes. Let the system have a non-extensive gap $\Delta$, and be rugged enough
that the ground state is thermodynamically stable at the temperature of the earth, $b > b_G$ in eq. (6) (Fig. 3
shows a schematic of buffing between low-energy states). Note that all the thermodynamic properties of
buffed landscapes are still the same as for random sequences. For example, configurational states still have
the same variance in energies, so there is little or no thermal entropy left in the system when $b > b_G$. It is the kinetic
properties of the system which are different. Because the ground state is competing with polynomially many other
low-energy states, its Boltzmann weight is given by $w \approx (1 + c' N^{\gamma} e^{-\Delta/T})^{-1}$, where $c'$ is a constant of $O(1)$
and $\gamma$ is an exponent less than one. So even for a non-extensive gap $\Delta \sim O(N^{1/2})$, the system will still have
a large Boltzmann weight in the ground state.
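The claim that a gap scaling only as $N^{1/2}$ suffices can be made concrete with the Boltzmann-weight formula above. In the Python sketch below, $c' = 1$, $\gamma = 1/2$, $T = 1$, and $\Delta = N^{1/2}$ (in units of $k_BT$) are illustrative choices, not values fixed by the text.

```python
import math


def ground_state_weight(N, gap, T=1.0, c_prime=1.0, gamma=0.5):
    """w = (1 + c' N**gamma * exp(-gap/T))**-1; c' and gamma are illustrative choices."""
    return 1.0 / (1.0 + c_prime * N ** gamma * math.exp(-gap / T))


if __name__ == "__main__":
    for N in (25, 100, 400, 1600):
        print(N, ground_state_weight(N, gap=math.sqrt(N)))   # gap ~ N^(1/2), T = 1
```

Because $e^{-N^{1/2}}$ decays faster than any power $N^{\gamma}$ grows, the ground-state weight climbs toward one with chain length (already above 0.999 at $N = 100$ here), whereas with no gap it would fall as $N^{-\gamma}$.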

How rare are these buffed sequences as the system size grows? Using eq. (8) and the results from the
Appendix, we find an asymptotic expression for the fraction of surviving paths. To leading order

$$\phi \sim N^{1/2}\, e^{-\alpha N - \beta N^{z}} , \qquad (15)$$

where $\alpha$ and $\beta$ are positive, $N$-independent coefficients. The prefactor arises from the free propagator. In the exponent, the first term is a diffusion term that corresponds
physically to the fact that buffed sequences are rare because of the likelihood of the specific free
energy profile to fluctuate outside the allowed window after $N$ steps. The second term in the exponent comes from the
forcing term in the diffusion equation and embodies the fact that buffed sequences are rare because typical
sequences follow the average free energy profile. However, perhaps unexpectedly, the forcing term is subdominant,
scaling more weakly with $N$ (recall $z$ is between 1/2 and 2/3 since the nucleus is roughened). Since
$\phi$ is exponentially small, the probability that a trap is buffed over any route (in the expression for $p_B(F^{\ddagger}|E,b)$
above) has only polynomial corrections: $p_B \sim N\phi$.

Because it is hardest to buff out the lowest energies, a sequence can be said to be buffed when a band of
energies starting at the ground state and scaling polynomially with system size has all traps buffed to $F^{\ddagger}$.
Buffing out polynomially many competing ground states gives from eq. (14) $P_B \approx p_B\, e^{aN}$ with $a < 1$. This
again modifies the scaling only by subdominant corrections. From eq. (15), more rugged sequences are
harder to buff. The fraction of buffed sequences stable in their ground state at biological temperatures is thus
dominated by those sequences having $b^2 = b_G^2$ in eq. (6). The fraction of sequences $P_R$ having this ruggedness
is given through eq. (3). The total fraction of buffed sequences is then

$$P_{\rm buff} \sim P_R(b_G)\, P_B(F^{\ddagger}, b_G) . \qquad (16)$$

For numerical values of the pair interaction energies such as those in the Miyazawa-Jernigan matrix, and
naturally occurring amino acid abundances, sequence-to-sequence fluctuations of the barrier (the factor $P_B$
in eq. (16)) dominate the possibility of finding a buffed foldable sequence, with the probability
of finding a sufficiently rugged composition playing a secondary role.

From eqs. (4) and (16), we see that the relative rareness of buffed vs. funneled sequences becomes a
quantitative issue. We address this in Figure 4, where the fraction of sequences buffed to $F^{\ddagger} = 4\,k_BT$ is
plotted as a function of chain length $N$, together with the fraction of sequences sufficiently gapped that the
forward folding barrier is $4\,k_BT$. The folding barrier is determined from a capillarity model; however, the
gap must still scale extensively with chain length to ensure a constant ratio $T_F/T_G$. There is a crossover
in the figure beyond which it becomes easier to funnel. However, for short sequences, as well as for longer
sequences with degrees of freedom removed (due e.g. to native secondary or tertiary structure formation),
buffing becomes more likely.

Discussion. With increasing chain length it becomes exponentially rare for sequences to be significantly 
buffed. Likewise it is exponentially difficult to be funneled, so the question is which exponential wins. Our 
estimates suggest that funneling overall is the dominant mechanism at least for larger proteins which fold 
on biological time scales. Still the buffing mechanism raises some interesting practical questions. 

Is it possible to design and synthesize a "buffed foldable protein"? Despite their rareness, buffed proteins
probably exist and would be interesting objects, since they would possess a polynomial number of accessible
states that could act as memories. Unfortunately, unlike funneled landscapes, the definition of buffed landscapes
does not immediately lead to a design algorithm. We note that only recently has the funneling recipe
been explicitly used in achieving foldability de novo in the laboratory [16]. Still, combinatorial strategies
are possible. Although natural protein landscapes are most likely funneled, the buffing mechanism may
play a partial role in folding kinetics. Once a significant part of the protein is at least locally fixed by the
funneling mechanism, the entropy is reduced to a level where buffing can occur. This can act to remove
specific intermediates or high-energy transition states on the landscape.

Other mechanisms in the spirit of buffing may exist which can facilitate folding. One can envision "anti-buffed"
or "gated" landscapes which have regions of phase space closed off by high-energy barriers, so that
the system folds faster by exploring a smaller region of phase space, and avoiding regions which may contain 
deep traps. The fraction of sequences having gated landscapes may be worked out in a similar fashion to
the fraction of buffed landscapes, and is a topic for future work. Such a corralling mechanism on a funneled 
landscape is one way of describing the gatekeeper residues discovered by Otzen and Oliveberg in S6 [17]. 
Gating is similar in spirit to Levinthal's original resolution to the kinetic paradoxes in protein folding [18], 
as well as issues which arise when considering kinetic partitioning mechanisms [19]. Gating may also play a 
role in avoiding misfolded protein structures responsible for aggregation-related diseases. 

Funneling is a sufficient and likely, but not necessary, condition for achieving foldability. However, the above
folding mechanisms are not mutually exclusive: funneled proteins may also exhibit buffing. The extent to
which buffing occurs in real proteins may be tested, for example, by looking at the mutational sensitivity
of the weights of non-native transient folding intermediates. Two candidates for these studies are the α-helical
early folding intermediate in β-lactoglobulin [20], and the major kinetic traps in α-lactalbumin, whose
populations may be measured by disulfide scrambling [21].

The ways evolution has assisted the folding and function of proteins are manifold; we have described an
alternative to funneling here. It will be interesting to see what other organizing principles of energy
landscapes are important in describing structural and functional biology.

Acknowledgments: S.S.P. acknowledges funding from NSERC and the Canada Research Chairs (CRC) 
program. P.G.W. acknowledges funding from NIH grant R01-GM44557. 

Appendix: Solutions for Green's function for buffing. In this section we find approximate
solutions to eq. (9), in order to derive analytical expressions for $\varphi(F^{\ddagger})$ in eq. (8). We are interested here
in finding the fraction of sequences whose kinetic barriers are below $F^{\ddagger}$, where $F^{\ddagger} \lesssim O(k_B T)$. The wave
packet solution to (9) spreads out diffusively to width $F^{\ddagger}$ in a number of steps $N^{*} \sim (F^{\ddagger})^{2}/b^{2}$. But since
both $F^{\ddagger}$ and $b$ are $\sim O(k_B T)$, $N^{*} \lesssim O(1)$. Hence we expect that by $N$ steps the wave function will
have long since relaxed to the ground state. This also means that the 'time'-rate of change of the drift
term in eq. (9) is slow compared to the rate at which the wave packet relaxes. The adiabatic parameter is
given here by $\left|\langle 2|\partial H/\partial N_e|1\rangle/(E_2 - E_1)^{2}\right|$, where the numerator is the matrix element of $\dot{F}(N_e)\,\partial/\partial F$
between the ground and first excited eigenfunctions of the unperturbed Hamiltonian, and the denominator
is the square of the difference between the first excited and ground state eigenvalues. A straightforward calculation gives the
parameter as $L^{3}\sigma/\pi^{4}b^{4}$ times an inverse power of $N_e$, where $L$, defined above eq. (8), is of order a few $k_B T$. This
parameter is much smaller than unity over the whole range of $N_e$ for all energies contributing to barriers to
be buffed. Hence we conclude that the adiabatic approximation is a good one here. The time-dependent
prefactor $F(N_e)$ in (9) may then be treated as a constant parameter, and the solution given in terms
of that parameter.
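To make the adiabatic estimate concrete, the sketch below treats the unperturbed problem as diffusion between absorbing walls a distance $L$ apart, i.e. the operator $-(cb^2/2)\partial_F^2$ with eigenvalues $E_k = (cb^2/2)(k\pi/L)^2$. It computes the gap $E_2 - E_1$, the corresponding relaxation time in steps, and the matrix element $\langle 2|\partial_F|1\rangle$ (analytically $8/3L$), assuming $\partial H/\partial N_e = \dot F\,\partial_F$. The values of $c$, $b$, $L$ and the drift rate are illustrative assumptions in units of $k_BT$, not values from the paper.

```python
import math

# Illustrative values in units of k_B T (assumptions, not taken from the paper)
C = 1.0            # diffusion constant c
B = 1.0            # ruggedness b
WIDTH = 3.0        # wall separation L, "a few k_B T"
DRIFT_RATE = 0.01  # hypothetical d/dN_e of the drift coefficient F(N_e)


def box_eigenvalue(k, width, cb2):
    """k-th eigenvalue of -(cb^2/2) d^2/dF^2 with absorbing walls `width` apart."""
    return 0.5 * cb2 * (k * math.pi / width) ** 2


def box_mode(k, x, width):
    """Normalized eigenfunction sqrt(2/L) sin(k pi x / L), x measured from the lower wall."""
    return math.sqrt(2.0 / width) * math.sin(k * math.pi * x / width)


def box_mode_deriv(k, x, width):
    """Derivative of box_mode with respect to x."""
    return math.sqrt(2.0 / width) * (k * math.pi / width) * math.cos(k * math.pi * x / width)


def matrix_element_21(width, n_grid=20000):
    """<2| d/dF |1> by midpoint quadrature; the exact value is 8 / (3 * width)."""
    dx = width / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * dx
        total += box_mode(2, x, width) * box_mode_deriv(1, x, width)
    return total * dx


def adiabatic_parameter(drift_rate, width, cb2):
    """|<2| dH/dN_e |1>| / (E2 - E1)^2, assuming dH/dN_e = drift_rate * d/dF."""
    gap = box_eigenvalue(2, width, cb2) - box_eigenvalue(1, width, cb2)
    return abs(drift_rate * matrix_element_21(width)) / gap ** 2


cb2 = C * B * B
gap = box_eigenvalue(2, WIDTH, cb2) - box_eigenvalue(1, WIDTH, cb2)
print(f"relaxation time ~ {1.0 / gap:.3f} steps")
print(f"adiabatic parameter ~ {adiabatic_parameter(DRIFT_RATE, WIDTH, cb2):.2e}")
```

Under these assumptions the parameter equals $32\dot F L^{3}/(27\pi^{4}c^{2}b^{4})$, so it carries the $L^{3}/\pi^{4}b^{4}$ structure quoted above, and the relaxation time comes out below one step, in line with $N^{*} \lesssim O(1)$.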

The adiabatic approximation is a special case of a more general transformation. We may eliminate
the drift term in (9) by letting $G(F, N_e) = K(F, N_e)\exp(g(F, N_e))$, with
$g(F, N_e) = F(N_e)F/(cb^{2}) - (1/2cb^{2})\int_{0}^{N_e} dN'_e\, F(N'_e)^{2}$. This yields a new differential equation for $K(F, N_e)$ with a time- and position-dependent sink term:

$$\partial_{N_e} K(F,N_e) - \frac{cb^{2}}{2}\,\partial_F^{2} K(F,N_e) + \left(\frac{\dot{F}(N_e)}{cb^{2}}\,F + V\big(F(N_e)\,|\,F^{\ast}\big)\right) K(F,N_e) = \delta(N_e)\,\delta(F-E)\,\lim_{N_e\to 0} e^{-g(F,N_e)} \qquad (17)$$
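The cancellation of the drift term can be checked numerically. The sketch below assumes eq. (9) has the form $\partial_{N_e}G - (cb^2/2)\partial_F^2 G + F(N_e)\partial_F G + VG$, which is the form whose drift the transformation above removes, and uses a hypothetical linear drift coefficient, an arbitrary smooth test function $K$ and an arbitrary sink $V$. It applies that operator to $G = Ke^{g}$ by finite differences, multiplies by $e^{-g}$, and compares the result with the eq. (17) operator applied to $K$.

```python
import math

# Hypothetical choices (not from the paper): c, b, a linear drift coefficient,
# an arbitrary smooth test function K, and an arbitrary sink term V.
C, B = 1.3, 0.9
CB2 = C * B * B
D0, D1 = 0.5, 0.2  # drift coefficient F(N_e) = D0 + D1 * N_e


def drift(n):
    return D0 + D1 * n


def drift_dot(n):
    return D1


def drift_sq_integral(n):
    """Closed form of the integral of drift(n')**2 from 0 to n."""
    return ((D0 + D1 * n) ** 3 - D0 ** 3) / (3 * D1)


def g(f, n):
    """Exponent of the transformation G = K exp(g)."""
    return drift(n) * f / CB2 - drift_sq_integral(n) / (2 * CB2)


def sink(f):
    return 0.3 * f * f


def K(f, n):
    return math.exp(-n) * math.sin(f) * (1 + 0.1 * n)


def G(f, n):
    return K(f, n) * math.exp(g(f, n))


def d_n(fun, f, n, h=1e-4):
    return (fun(f, n + h) - fun(f, n - h)) / (2 * h)


def d_f(fun, f, n, h=1e-4):
    return (fun(f + h, n) - fun(f - h, n)) / (2 * h)


def d_ff(fun, f, n, h=1e-3):
    return (fun(f + h, n) - 2 * fun(f, n) + fun(f - h, n)) / (h * h)


def eq9_operator_on_G(f, n):
    """Assumed eq. (9) operator: d_N G - (cb^2/2) d_FF G + F(N_e) d_F G + V G."""
    return (d_n(G, f, n) - 0.5 * CB2 * d_ff(G, f, n)
            + drift(n) * d_f(G, f, n) + sink(f) * G(f, n))


def eq17_operator_on_K(f, n):
    """Eq. (17) operator: no drift term, extra sink dF/dN_e * F / cb^2."""
    return (d_n(K, f, n) - 0.5 * CB2 * d_ff(K, f, n)
            + (drift_dot(n) * f / CB2 + sink(f)) * K(f, n))


def residual(f, n):
    return eq9_operator_on_G(f, n) * math.exp(-g(f, n)) - eq17_operator_on_K(f, n)


for f0, n0 in [(0.7, 1.5), (-1.2, 0.4), (2.0, 3.0)]:
    print(f0, n0, residual(f0, n0))
```

The residuals sit at the level of the finite-difference error. Dropping the $\dot F(N_e)F/cb^2$ term from the eq. (17) operator breaks the agreement; that is exactly the term the adiabatic approximation neglects.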

Since in the adiabatic approximation the coefficient of the drift term can be treated as a constant, the
first term in the parentheses in (17), which is proportional to the time-derivative of the drift coefficient, can be neglected.
Then for $N_e > 0$, $K(F, N_e)$ satisfies a simple diffusion equation with absorbing walls, so that
$K(F(b,T) - F^{\ast}, N_e) = K(E + F^{\ast}, N_e) = 0$, and the initial condition is $K(F, 0) = \delta(F - E)$. The expansion in
eigenfunctions may thus readily be written down. As mentioned above, ground state dominance is an excellent
approximation for $N_e > 1$, so the solution is given by the ground state term in the eigenfunction
expansion:

$$G(F, N_e) \simeq \frac{2}{L}\, e^{-\pi^{2} c b^{2} N_e/2L^{2}}\, \sin\!\left(\frac{\pi(E + \delta F^{\ast})}{L}\right) \sin\!\left(\frac{\pi(F + \delta F^{\ast})}{L}\right) e^{g(F, N_e)} \qquad (18)$$

where $\delta F^{\ast} = F^{\ast} - F(b,T)$. The propagator to $(F(b,T), N)$ is then $G(F(b,T), N)$. The fraction of surviving
paths is given by $\varphi(F^{\ddagger})$ in (8) together with eq. (18) and the free propagator in (10).
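Ground state dominance can be illustrated directly from the eigenfunction expansion of the absorbing-wall kernel. The sketch below sums the full series for $K$ (before the $e^{g}$ factor) in the shifted variable $x = F + \delta F^{\ast}$, so that the walls sit at $x = 0$ and $x = L$, and compares it with the single ground-state term. The wall separation, $cb^2$ and the start and end points are illustrative assumptions.

```python
import math


def box_kernel(x, x0, n_steps, width, cb2, n_modes=200):
    """Eigenfunction expansion of the diffusion kernel between absorbing walls
    at x = 0 and x = width, with x = F + dF* and x0 = E + dF*."""
    total = 0.0
    for k in range(1, n_modes + 1):
        rate = 0.5 * cb2 * (k * math.pi / width) ** 2
        total += ((2.0 / width) * math.exp(-rate * n_steps)
                  * math.sin(k * math.pi * x0 / width)
                  * math.sin(k * math.pi * x / width))
    return total


def ground_state_kernel(x, x0, n_steps, width, cb2):
    """Only the k = 1 term of the expansion."""
    return box_kernel(x, x0, n_steps, width, cb2, n_modes=1)


WIDTH, CB2 = 3.0, 1.0  # illustrative L and c b^2 in units of k_B T
x_mid = WIDTH / 2      # start and end in the middle of the allowed window
for n_steps in (0.25, 1.0, 2.0):
    ratio = (ground_state_kernel(x_mid, x_mid, n_steps, WIDTH, CB2)
             / box_kernel(x_mid, x_mid, n_steps, WIDTH, CB2))
    print(f"N_e = {n_steps}: ground-state term / full series = {ratio:.4f}")
```

At the midpoint only odd modes contribute, and the next one decays nine times faster than the ground state, so a single term already carries almost all of the weight once $N_e$ is of order one.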

The $N$-scaling of the fraction of buffed sequences may be checked with an exactly solvable model. For near-ground
states buffed to $F^{\ddagger}$, the width $L = 2F^{\ddagger}$ does not scale with $N$, so the hard walls can be replaced by a parabolic
potential of a fixed, effective width. Then $V(F(N_e)\,|\,F^{\ast})$ in eq. (17) is replaced by $\frac{1}{2}\omega^{2}F^{2}/cb^{2}$, where the
effective frequency $\omega$ determines the spring stiffness. The differential equation (17) is then equivalent to a
Schrödinger equation with $\hbar = 1$, time $t$ replaced by $-iN_e$, position replaced by free energy $F$, and particle
mass replaced by $1/cb^{2}$. Shifting coordinates so that the effective harmonic well is centered at zero, the solution
to eq. (17) may be obtained exactly by integrating the Lagrangian over all paths [22]. The result is:

$$K_{\rm HO}(F, N_e \,|\, F_0, 0) = \left(\omega^{-1}\, 2\pi c b^{2} \sinh(\omega N_e)\right)^{-1/2} e^{S_{\rm cl}(F, N_e | F_0, 0)} \qquad (19)$$

where in the quantum mechanical analogy $S_{\rm cl}$ represents the (extremal) action along the classical path.
Calculating this for large $N$ shows that, analogous to the "forcing term" we saw before, the classical action
$S_{\rm cl}(F, N | F_0, 0)$ does not dominate the scaling behavior. Instead the prefactor, measuring deviations from
the path of least action, plays the most important role, scaling as $\sim e^{-\omega N/2}$ for large $N$. Following the
same arguments as before, this is essentially the scaling law for the fraction of buffed sequences. Comparing
the scaling law here with that in eq. (16), we see that the effective frequency $\omega$ is fixed by requiring that the unforced
propagators in the hard-wall and parabolic-wall cases have the same survival probability after $N$ steps.
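The large-$N$ behavior of the prefactor in eq. (19) can be checked numerically: its logarithm drops by $\omega/2$ per step. If, as an assumption, the hard-wall survival in eq. (16) decays like the ground-state factor $e^{-\pi^2 cb^2 N/2L^2}$ of diffusion between absorbing walls a distance $L$ apart, matching the two decay rates gives $\omega = \pi^2 cb^2/L^2$. The function names and parameter values below are illustrative.

```python
import math


def ho_prefactor(n_steps, omega, cb2):
    """Prefactor of eq. (19): (2 pi c b^2 sinh(omega N) / omega)^(-1/2)."""
    return (2.0 * math.pi * cb2 * math.sinh(omega * n_steps) / omega) ** -0.5


def log_decay_per_step(n_steps, omega, cb2):
    """ln P(N + 1) - ln P(N); tends to -omega / 2 for large N."""
    return (math.log(ho_prefactor(n_steps + 1, omega, cb2))
            - math.log(ho_prefactor(n_steps, omega, cb2)))


def matched_omega(width, cb2):
    """Hypothetical matching: omega / 2 equals the hard-wall rate pi^2 c b^2 / (2 L^2)."""
    return math.pi ** 2 * cb2 / width ** 2


WIDTH, CB2 = 3.0, 1.0  # illustrative L and c b^2
omega = matched_omega(WIDTH, CB2)
hard_wall_rate = 0.5 * CB2 * (math.pi / WIDTH) ** 2
for n_steps in (1, 5, 40):
    print(n_steps, log_decay_per_step(n_steps, omega, CB2), -hard_wall_rate)
```

At small $N$ the $\sinh$ has not yet reached its exponential regime, so the per-step decay differs from $-\omega/2$; by $N = 40$ the two columns agree.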

Figure Captions 

FIG. 1 Nucleation free energy profiles in trap escape, as a function of the number of escaped residues $N_e$. Shown
are the mean free energy profile (dashed line) and a typical profile for sequences constrained to have barriers smaller
than $E + F^{\ast}$ and to stay above $F(b,T) - F^{\ast}$. Also shown schematically is the propagator $G(F, N_e)$ for various values of $N_e$. The
system rapidly relaxes to the ground state wavefunction after $O(1)$ steps (see text).

FIG. 2 Long dashed line: The log number of states vs. energy. Solid line: Approximate log number of traps
from equation (12). Short dashed line: Log number of traps for the error function expression of $f_T$ in the text above
eq. (12). The approximation is accurate for energies near the ground state, but falls off too rapidly for energies near
zero. There are still traps at $E = 0$ because states may have positive energies as well. Estimates are used here for
the number of states per residue ($\Omega = e^{2N}$), system size ($N = 100$), and number of connected states ($\gamma = 1$). Pair
interactions are taken ($p = 1$), energies are in units of $\sqrt{N}B$, and $S_q = 1/N$.

FIG. 3 Schematic of the energy landscape for a sequence with buffed ground states, projected onto a configurational
coordinate. The barriers between the ground states are reduced to $F^{\ddagger}$, but the overall variance of energy
between states is not reduced. The lowest kinetic barriers between the states determine the escape rate. Along the
coordinate(s) where the landscape is buffed, the kinetic barriers between states are reduced.

FIG. 4 The fraction of foldable sequences, on a log scale, as a function of chain length $N$. Both funneling
and buffing mechanisms are shown here. Dashed line: Fraction of funneled sequences with forward folding barrier
$F^{\ast} = 4k_BT$ and $T_F/T_G = 1.6$. Solid line: Fraction of sequences buffed to $F^{\ddagger} = 4k_BT$. The adiabatic approximation
for the Green's function in eq. (9) is used (see appendix). The crossover from buffing to funneling suggests the possibility
of a compound mechanism for generating a functional protein sequence. The funneling mechanism removes
most of the entropy while guiding the protein to a smaller ensemble of similar structures. Then, for this reduced
collection of states, dynamics between individual traps is most likely mediated by buffing. This dynamics may be
related to functionally important motions in proteins [1]. For funneled sequences we assume no ruggedness selection
occurs. We scale the MJ interaction parameters so that in our units $T_F/T_G \approx 1.6$, a common value taken from the
literature [23]. Buffed sequences must be selected to be sufficiently rugged to be stable at biological temperatures,
and must have low kinetic barriers to be accessible on biological time scales. (INSET) Comparison of the fraction
of buffed sequences from the adiabatic approximation and from the full numerical solution to eq. (9). Several values of
ruggedness are plotted. Circle: $b = 1.255$, Triangle: $b = 1.4$, Square: $b = 2.0$.